
Perf/incremental symbol cache#155

Open
shrutu0929 wants to merge 1 commit into Refactron-ai:main from shrutu0929:perf/incremental-symbol-cache

Conversation

@shrutu0929 shrutu0929 commented Apr 7, 2026

Solves #149.

This PR reworks the scanning process to make it significantly faster and more resource-efficient. Previously, the system blindly scanned the entire project tree, often wasting compute cycles indexing virtual environments, git histories, and cache directories. We resolved this by implementing a targeted directory filter that excludes irrelevant folders such as .git, .venv, __pycache__, and node_modules during the file discovery phase.

In addition to narrowing the search area, we introduced a robust incremental caching mechanism that tracks each file's modification timestamp (mtime) in a new persistent metadata dictionary. Rather than executing an expensive, all-or-nothing cache reload, the builder now loads the existing symbol table and targets only the files that have been created, modified, or deleted since the last run. When a file change is detected, the old footprint, including any globally exported references, is cleanly purged before the updated contents are re-analyzed. This keeps the application's underlying symbol table strictly in sync with on-disk changes without forcing users to pay the performance cost of a full rebuild.
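The directory filter described above can be sketched as follows. This is an illustrative example, not the actual code from the PR; `EXCLUDED_DIRS` and `discover_python_files` are hypothetical names chosen for the sketch:

```python
from pathlib import Path

# Hypothetical sketch of the targeted directory filter: any file whose path
# contains one of these directory components is skipped during discovery.
EXCLUDED_DIRS = {".git", ".venv", "__pycache__", "node_modules"}

def discover_python_files(root: Path) -> list:
    """Recursively find *.py files, skipping excluded directories."""
    return [
        f for f in root.rglob("*.py")
        if not any(part in EXCLUDED_DIRS for part in f.parts)
    ]
```

Because the check runs against every path component, files nested anywhere under an excluded directory (e.g. `.venv/lib/site.py`) are filtered out, not just top-level matches.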

Summary by CodeRabbit

  • Refactor
    • Improved cache invalidation to skip unchanged files and optimize analysis performance.
    • Enhanced file change detection to properly identify and handle modified and deleted files.
    • Increased cache persistence reliability with improved metadata tracking.

@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough


The symbol table cache now implements incremental invalidation by tracking per-file modification times. Changed files are detected via metadata comparison, removed from the cache, and re-analyzed; deleted files are identified and pruned. The cache persists only when updates occur.

Changes

Cohort / File(s) Summary
Cache Metadata & Invalidation
refactron/analysis/symbol_table.py
Added metadata field to SymbolTable for tracking file modification times. Introduced remove_file() method for evicting file symbols and exports. Modified build_for_project() to scan files, compare modification times against cached metadata, invalidate changed files, detect and remove deleted files, and save cache only if updates occurred.

Sequence Diagram(s)

sequenceDiagram
    participant Builder as SymbolTableBuilder
    participant FS as Filesystem
    participant Cache as Cache Storage
    participant SymTable as SymbolTable
    
    Builder->>Cache: Load cached SymbolTable & metadata
    activate Cache
    Cache-->>Builder: Return cached data
    deactivate Cache
    
    Builder->>FS: Scan *.py files in project
    activate FS
    FS-->>Builder: Return file list & current mtimes
    deactivate FS
    
    loop For each scanned file
        alt File unchanged (mtime matches metadata)
            Builder->>SymTable: Skip (use cached symbols)
        else File changed or new
            Builder->>SymTable: remove_file(file_path)
            Builder->>FS: Re-analyze file
            Builder->>SymTable: Add updated symbols
            Builder->>SymTable: Update metadata[file_path]
        end
    end
    
    loop Detect deleted files
        Builder->>SymTable: Check cached symbols vs current files
        alt File no longer exists
            Builder->>SymTable: remove_file(file_path)
        end
    end
    
    alt Any updates made
        Builder->>Cache: Save updated SymbolTable with metadata
        activate Cache
        Cache-->>Builder: Cache persisted
        deactivate Cache
    else No updates
        Builder->>Builder: Skip cache save
    end
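The flow in the diagram can be condensed into a small Python loop. This is a sketch only; `incremental_update` is a hypothetical stand-in for the builder's logic, with `analyze` and `remove` standing in for `_analyze_file` and `remove_file`:

```python
import os

def incremental_update(files, metadata, analyze, remove):
    """Sketch of the incremental pass: compare current mtimes against the
    cached metadata, re-analyze changed/new files, prune deleted ones.
    Returns True if anything changed, i.e. the cache should be re-saved."""
    updated = False
    current = {str(f): os.stat(f).st_mtime for f in files}

    for path, mtime in current.items():
        if metadata.get(path) == mtime:
            continue              # unchanged: keep cached symbols
        remove(path)              # purge the old footprint (no-op for new files)
        analyze(path)             # re-index the new contents
        metadata[path] = mtime
        updated = True

    for path in list(metadata):   # prune files that no longer exist on disk
        if path not in current:
            remove(path)
            metadata.pop(path, None)
            updated = True

    return updated
```

A second run over an unchanged tree returns False, which is what lets the builder skip the cache save entirely.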

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐰 Hop along the cache, dear files, with timestamps in tow,
Changed ones get whisked away, while unchanged ones flow,
Deleted paths are sniffed out fast with keen rabbit nose,
Save only when it matters—how efficiently it grows!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Perf/incremental symbol cache' directly and specifically refers to the main change in the PR: implementing incremental caching for the symbol table to improve performance.
Docstring Coverage ✅ Passed Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@shrutu0929 shrutu0929 force-pushed the perf/incremental-symbol-cache branch from 28f151f to 18c8108 Compare April 7, 2026 06:15
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
refactron/analysis/symbol_table.py (3)

180-180: Redundant getattr usage.

self.cache_dir is always initialized in __init__ (line 136), so getattr(self, "cache_dir", None) is unnecessary. Using self.cache_dir directly is cleaner and more consistent with line 142.

♻️ Suggested fix
-        if getattr(self, "cache_dir", None) and updated:
+        if self.cache_dir and updated:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` at line 180, The condition uses
getattr(self, "cache_dir", None) even though cache_dir is always set in
__init__; replace the getattr call with a direct attribute check (use
self.cache_dir) in the conditional inside the method where updated is checked
(the if statement currently written as getattr(self, "cache_dir", None) and
updated). Update the condition to use self.cache_dir and keep the rest of the
logic unchanged so it matches the style used elsewhere (e.g., the direct access
on line 142).

72-76: Unused loop variable symbol.

The symbol variable from the iteration is not used; only name is needed for the exports lookup. Static analysis flagged this (B007).

♻️ Suggested fix
         for scope, names in self.symbols[file_path].items():
             if scope == "global":
-                for name, symbol in list(names.items()):
+                for name in list(names.keys()):
                     if name in self.exports and self.exports[name].file_path == file_path:
                         del self.exports[name]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 72 - 76, The loop declares
an unused variable `symbol` in "for name, symbol in list(names.items())" which
triggers a linter warning; change the loop to only use the name (e.g., "for name
in list(names.keys())" or "for name, _ in list(names.items())") so the unused
variable is removed, keeping the existing logic that checks
self.exports[name].file_path == file_path and deletes from self.exports.

149-151: Line exceeds 100-character limit.

This list comprehension line appears to exceed the 100-character limit specified in the coding guidelines.

♻️ Suggested fix
-        python_files = [
-            f for f in all_python_files if not any(excluded in f.parts for excluded in excluded_dirs)
-        ]
+        python_files = [
+            f for f in all_python_files
+            if not any(excluded in f.parts for excluded in excluded_dirs)
+        ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 149 - 151, The list
comprehension assigning python_files is over 100 chars; split the expression
across multiple lines or extract the condition into a named helper to stay
within the limit. For example, move the any(...) check into a small function or
a separate variable (e.g., is_excluded = lambda p: any(excluded in p.parts for
excluded in excluded_dirs)) and then build python_files = [f for f in
all_python_files if not is_excluded(f)] so the line lengths for python_files,
the condition, and the any(...) call are each under 100 characters; references:
python_files, all_python_files, excluded_dirs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@refactron/verification/checks/syntax.py`:
- Around line 1-26: The new fast-path must be merged into the existing
SyntaxVerifier without changing the BaseCheck contract: update the existing
SyntaxVerifier.verify method to keep the signature verify(self, original: str,
transformed: str, file_path: Path) -> CheckResult, and implement the
VerificationContext optimization inside that method by retrieving or accepting a
VerificationContext instance from the surrounding engine (use the existing
engine/context helper rather than adding a new optional parameter); if a cached
context indicates transformed_ast is None and transformed.strip() != "" return a
failing CheckResult, otherwise attempt ast.parse(transformed) and return the
appropriate CheckResult on parse failure or success; also wrap/shorten the long
import/type line to <=100 chars and ensure you reference the types
VerificationContext and CheckResult where required.

---

Nitpick comments:
In `@refactron/analysis/symbol_table.py`:
- Line 180: The condition uses getattr(self, "cache_dir", None) even though
cache_dir is always set in __init__; replace the getattr call with a direct
attribute check (use self.cache_dir) in the conditional inside the method where
updated is checked (the if statement currently written as getattr(self,
"cache_dir", None) and updated). Update the condition to use self.cache_dir and
keep the rest of the logic unchanged so it matches the style used elsewhere
(e.g., the direct access on line 142).
- Around line 72-76: The loop declares an unused variable `symbol` in "for name,
symbol in list(names.items())" which triggers a linter warning; change the loop
to only use the name (e.g., "for name in list(names.keys())" or "for name, _ in
list(names.items())") so the unused variable is removed, keeping the existing
logic that checks self.exports[name].file_path == file_path and deletes from
self.exports.
- Around line 149-151: The list comprehension assigning python_files is over 100
chars; split the expression across multiple lines or extract the condition into
a named helper to stay within the limit. For example, move the any(...) check
into a small function or a separate variable (e.g., is_excluded = lambda p:
any(excluded in p.parts for excluded in excluded_dirs)) and then build
python_files = [f for f in all_python_files if not is_excluded(f)] so the line
lengths for python_files, the condition, and the any(...) call are each under
100 characters; references: python_files, all_python_files, excluded_dirs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e13da794-0cb8-41b3-80f4-1ed88f9154ae

📥 Commits

Reviewing files that changed from the base of the PR and between a9659f5 and 28f151f.

📒 Files selected for processing (4)
  • refactron/analysis/symbol_table.py
  • refactron/verification/checks/imports.py
  • refactron/verification/checks/syntax.py
  • refactron/verification/engine.py

Comment on lines +1 to +26
import ast
from typing import Optional
from refactron.verification.engine import BaseCheck, VerificationContext

class SyntaxVerifier(BaseCheck):
    def verify(self, original: str, transformed: str, context: Optional[VerificationContext] = None) -> bool:
        """
        Verify the syntax of the transformed code.
        Uses shared VerificationContext to avoid redundant AST parsing.
        """
        # If the VerificationEngine already tried to parse it and failed, the syntax implies it's broken
        if context and context.transformed_ast is None and transformed.strip() != "":
            return False

        # Optional: libcst parsing could also be cached, but for now we fallback
        # or rely strictly on ast if libcst isn't required by the context immediately

        # Fallback to local parsing for backward compatibility
        # (e.g. if run independently without the pre-configured context)
        if not context:
            try:
                ast.parse(transformed)
            except Exception:
                return False

        return True
Contributor


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, let's see the full file structure to confirm the duplicate
cat -n refactron/verification/checks/syntax.py | head -50

Repository: Refactron-ai/Refactron_lib

Length of output: 1971


🏁 Script executed:

# Check the BaseCheck abstract class contract
cat -n refactron/verification/engine.py | grep -A 15 "class BaseCheck"

Repository: Refactron-ai/Refactron_lib

Length of output: 585


🏁 Script executed:

# Check the result.py for CheckResult and VerificationResult definitions
cat -n refactron/verification/result.py | head -30

Repository: Refactron-ai/Refactron_lib

Length of output: 1078


🏁 Script executed:

# Check line length issues with the current snippet
python3 << 'EOF'
code = '''import ast
from typing import Optional
from refactron.verification.engine import BaseCheck, VerificationContext

class SyntaxVerifier(BaseCheck):
    def verify(self, original: str, transformed: str, context: Optional[VerificationContext] = None) -> bool:
        """
        Verify the syntax of the transformed code.
        Uses shared VerificationContext to avoid redundant AST parsing.
        """
        # If the VerificationEngine already tried to parse it and failed, the syntax implies it's broken
        if context and context.transformed_ast is None and transformed.strip() != "":
            return False
            
        # Optional: libcst parsing could also be cached, but for now we fallback 
        # or rely strictly on ast if libcst isn't required by the context immediately
        
        # Fallback to local parsing for backward compatibility 
        # (e.g. if run independently without the pre-configured context)
        if not context:
            try:
                ast.parse(transformed)
            except Exception:
                return False
                
        return True'''

for i, line in enumerate(code.split('\n'), 1):
    if len(line) > 100:
        print(f"Line {i} ({len(line)} chars): {line[:100]}...")
EOF

Repository: Refactron-ai/Refactron_lib

Length of output: 318


Merge this fast-path into the existing SyntaxVerifier instead of creating a duplicate.

The existing SyntaxVerifier is at line 12, not line 38. This proposed implementation would shadow it and violate the BaseCheck contract: verify() must accept file_path: Path and return CheckResult, not context: Optional[VerificationContext] and bool. This mismatch causes the linting failures (F811/E402/no-redef) and breaks the verification pipeline.

Additionally, line 6 exceeds the 100-character limit (109 chars).

Please refactor the VerificationContext optimization into the existing class while preserving the correct (original, transformed, file_path: Path) -> CheckResult signature per repo standards.

🧰 Tools
🪛 GitHub Actions: Pre-commit

[error] 15-15: flake8 (E501) line too long (104 > 100 characters)

🪛 Ruff (0.15.9)

[warning] 23-23: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/verification/checks/syntax.py` around lines 1 - 26, The new
fast-path must be merged into the existing SyntaxVerifier without changing the
BaseCheck contract: update the existing SyntaxVerifier.verify method to keep the
signature verify(self, original: str, transformed: str, file_path: Path) ->
CheckResult, and implement the VerificationContext optimization inside that
method by retrieving or accepting a VerificationContext instance from the
surrounding engine (use the existing engine/context helper rather than adding a
new optional parameter); if a cached context indicates transformed_ast is None
and transformed.strip() != "" return a failing CheckResult, otherwise attempt
ast.parse(transformed) and return the appropriate CheckResult on parse failure
or success; also wrap/shorten the long import/type line to <=100 chars and
ensure you reference the types VerificationContext and CheckResult where
required.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
refactron/analysis/symbol_table.py (1)

67-301: ⚠️ Potential issue | 🟡 Minor

CI is currently blocked by formatting hooks.

Pre-commit reports trailing whitespace and black reformatting for this file. Please run formatting hooks before merge to unblock pipeline.

As per coding guidelines, "**/*.py: Use black formatter with target-version set to py38, py39, py310, py311" and "Flake8 linting must use max-line-length of 100".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 67 - 301, The file has
trailing whitespace and needs Black/Flake8 formatting; run the repo pre-commit
hooks (or run black and flake8 manually) to fix whitespace and style issues:
remove trailing whitespace and run black with target-version options for py38,
py39, py310, py311, then run flake8 with max-line-length=100 to apply lint
rules; re-run the tests/CI to ensure hooks pass and commit the reformatted
changes (this affects functions like remove_file, add_symbol, build_for_project,
_save_cache, and _load_cache referenced in the diff).
🧹 Nitpick comments (1)
refactron/analysis/symbol_table.py (1)

183-184: Narrow cache-save exception handling and keep traceback context.

Line 183 catches Exception broadly. Prefer explicit exception classes and structured logging with traceback (exc_info) for diagnosability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 183 - 184, The broad except
in build_for_project (the except Exception as e around the cache save) should be
narrowed and include traceback context: catch only likely I/O/serialization
errors (e.g., OSError/IOError and pickle.PicklingError or the specific
serializer error used) instead of Exception, and log the failure with
exc_info=True (or use logger.exception) so the traceback is preserved; update
the import list if you reference pickle.PicklingError or other specific
exception classes and replace the current logger.warning(...) line with the
narrowed except block and structured logging call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 170-172: The metadata advance is happening regardless of whether
_analyze_file succeeded; modify the flow so that symbol_table.metadata[file_str]
= mtime and updated = True are only executed after a successful run of
_analyze_file (i.e., no exception was raised). Concretely, in the blocks that
call _analyze_file (referencing _analyze_file, file_str, mtime,
symbol_table.metadata, and the updated flag), move the metadata assignment and
updated=True into the success path (or after try/except only if no exception
occurred) and ensure caught exceptions do not update metadata so failed analyses
will be retried later; apply the same change for both occurrences that wrap
_analyze_file (the first block and the block around lines 188-199).
- Around line 67-80: The early return in remove_file prevents cleaning up
self.metadata for files that had no extracted symbols; update remove_file (in
symbol_table.SymbolTable) to always delete metadata for file_path even when
file_path not in self.symbols—either remove the early return and wrap the
symbol-specific cleanup in an if block (if file_path in self.symbols: ...del
self.symbols[file_path]) or keep the early return but perform metadata deletion
before returning (if file_path not in self.symbols: if file_path in
self.metadata: del self.metadata[file_path]; return); preserve the existing
export-cleanup logic for files that do exist in self.symbols.
- Around line 64-65: The metadata cache currently types values as Dict[str,
float] and uses st_mtime (float); change the metadata annotation to Dict[str,
int], replace any os.stat(...).st_mtime reads with os.stat(...).st_mtime_ns so
stored mtimes are integer nanoseconds, and update the deserialization path that
loads saved metadata to convert values to ints (e.g., {k: int(v) for k, v in
data.get("metadata", {}).items()}) so comparisons use integer ns timestamps.
Ensure you update the declaration of the metadata field, the place(s) that
set/check file mtime (use st_mtime_ns), and the load/deserialize code that
reconstructs metadata from persisted data.

---

Outside diff comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 67-301: The file has trailing whitespace and needs Black/Flake8
formatting; run the repo pre-commit hooks (or run black and flake8 manually) to
fix whitespace and style issues: remove trailing whitespace and run black with
target-version options for py38, py39, py310, py311, then run flake8 with
max-line-length=100 to apply lint rules; re-run the tests/CI to ensure hooks
pass and commit the reformatted changes (this affects functions like
remove_file, add_symbol, build_for_project, _save_cache, and _load_cache
referenced in the diff).

---

Nitpick comments:
In `@refactron/analysis/symbol_table.py`:
- Around line 183-184: The broad except in build_for_project (the except
Exception as e around the cache save) should be narrowed and include traceback
context: catch only likely I/O/serialization errors (e.g., OSError/IOError and
pickle.PicklingError or the specific serializer error used) instead of
Exception, and log the failure with exc_info=True (or use logger.exception) so
the traceback is preserved; update the import list if you reference
pickle.PicklingError or other specific exception classes and replace the current
logger.warning(...) line with the narrowed except block and structured logging
call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: df6dac6b-b756-4d21-97d4-82f104f50b8c

📥 Commits

Reviewing files that changed from the base of the PR and between 28f151f and 18c8108.

📒 Files selected for processing (1)
  • refactron/analysis/symbol_table.py

Comment on lines +64 to +65
# Metadata for caching (Map: file_path -> mtime)
metadata: Dict[str, float] = field(default_factory=dict)
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify all mtime usage points to ensure precision consistency.
rg -n --type=py -C2 '\bst_mtime\b|\bst_mtime_ns\b'

Repository: Refactron-ai/Refactron_lib

Length of output: 2427


🏁 Script executed:

cat -n refactron/analysis/symbol_table.py | head -80

Repository: Refactron-ai/Refactron_lib

Length of output: 3232


🏁 Script executed:

sed -n '155,170p' refactron/analysis/symbol_table.py

Repository: Refactron-ai/Refactron_lib

Length of output: 647


🏁 Script executed:

sed -n '260,275p' refactron/analysis/symbol_table.py

Repository: Refactron-ai/Refactron_lib

Length of output: 754


🏁 Script executed:

sed -n '295,310p' refactron/analysis/symbol_table.py

Repository: Refactron-ai/Refactron_lib

Length of output: 552


Use st_mtime_ns (integer nanoseconds) instead of st_mtime (float seconds) for reliable cache invalidation.

The current code uses st_mtime (float seconds) and compares with ==, which can miss rapid file edits on filesystems with coarse timestamp resolution (e.g., 1-second granularity on some systems). Switching to st_mtime_ns (integer nanoseconds) provides sub-microsecond precision and avoids floating-point comparison issues.

Update:

  • Line 65: Change metadata: Dict[str, float] to Dict[str, int]
  • Line 159: Change st_mtime to st_mtime_ns
  • Line 301: Convert deserialized values with {k: int(v) for k, v in data.get("metadata", {}).items()}
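The difference between the two stat fields can be seen directly (illustrative snippet; the variable names are not from the PR):

```python
import os
import tempfile

# st_mtime is a float in seconds and can lose precision; st_mtime_ns is an
# exact integer count of nanoseconds, so equality checks against a cached
# value are reliable even for rapid successive edits.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    path = tmp.name

st = os.stat(path)
seconds = st.st_mtime     # float seconds since the epoch
nanos = st.st_mtime_ns    # int nanoseconds since the epoch
```

Comparing the integer `st_mtime_ns` with `==` avoids both floating-point rounding and the coarse-granularity misses described above.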
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 64 - 65, The metadata cache
currently types values as Dict[str, float] and uses st_mtime (float); change the
metadata annotation to Dict[str, int], replace any os.stat(...).st_mtime reads
with os.stat(...).st_mtime_ns so stored mtimes are integer nanoseconds, and
update the deserialization path that loads saved metadata to convert values to
ints (e.g., {k: int(v) for k, v in data.get("metadata", {}).items()}) so
comparisons use integer ns timestamps. Ensure you update the declaration of the
metadata field, the place(s) that set/check file mtime (use st_mtime_ns), and
the load/deserialize code that reconstructs metadata from persisted data.

Comment on lines +67 to +80
def remove_file(self, file_path: str) -> None:
    """Remove all symbols associated with a specific file."""
    if file_path not in self.symbols:
        return

    for scope, names in self.symbols[file_path].items():
        if scope == "global":
            for name, symbol in list(names.items()):
                if name in self.exports and self.exports[name].file_path == file_path:
                    del self.exports[name]

    del self.symbols[file_path]
    if file_path in self.metadata:
        del self.metadata[file_path]
Contributor


⚠️ Potential issue | 🟡 Minor

Fix stale metadata cleanup in remove_file.

At Line 69, the early return prevents metadata cleanup for files that had no extracted symbols. Deleted symbol-less files can remain forever in metadata.

Proposed fix
 def remove_file(self, file_path: str) -> None:
     """Remove all symbols associated with a specific file."""
-    if file_path not in self.symbols:
-        return
-        
-    for scope, names in self.symbols[file_path].items():
-        if scope == "global":
-            for name, symbol in list(names.items()):
-                if name in self.exports and self.exports[name].file_path == file_path:
-                    del self.exports[name]
-                        
-    del self.symbols[file_path]
-    if file_path in self.metadata:
-        del self.metadata[file_path]
+    if file_path in self.symbols:
+        for scope, names in self.symbols[file_path].items():
+            if scope == "global":
+                for name in list(names):
+                    if name in self.exports and self.exports[name].file_path == file_path:
+                        del self.exports[name]
+        del self.symbols[file_path]
+    self.metadata.pop(file_path, None)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def remove_file(self, file_path: str) -> None:
    """Remove all symbols associated with a specific file."""
    if file_path not in self.symbols:
        return
    for scope, names in self.symbols[file_path].items():
        if scope == "global":
            for name, symbol in list(names.items()):
                if name in self.exports and self.exports[name].file_path == file_path:
                    del self.exports[name]
    del self.symbols[file_path]
    if file_path in self.metadata:
        del self.metadata[file_path]

def remove_file(self, file_path: str) -> None:
    """Remove all symbols associated with a specific file."""
    if file_path in self.symbols:
        for scope, names in self.symbols[file_path].items():
            if scope == "global":
                for name in list(names):
                    if name in self.exports and self.exports[name].file_path == file_path:
                        del self.exports[name]
        del self.symbols[file_path]
    self.metadata.pop(file_path, None)
🧰 Tools
🪛 Ruff (0.15.9)

[warning] 74-74: Loop control variable symbol not used within loop body

Rename unused symbol to _symbol

(B007)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 67 - 80, The early return in
remove_file prevents cleaning up self.metadata for files that had no extracted
symbols; update remove_file (in symbol_table.SymbolTable) to always delete
metadata for file_path even when file_path not in self.symbols—either remove the
early return and wrap the symbol-specific cleanup in an if block (if file_path
in self.symbols: ...del self.symbols[file_path]) or keep the early return but
perform metadata deletion before returning (if file_path not in self.symbols: if
file_path in self.metadata: del self.metadata[file_path]; return); preserve the
existing export-cleanup logic for files that do exist in self.symbols.

Comment on lines 170 to +172
            self._analyze_file(file_path)
            self.symbol_table.metadata[file_str] = mtime
            updated = True
Contributor


⚠️ Potential issue | 🟠 Major

Don’t advance metadata when analysis fails.

At Line 171, metadata is updated even if _analyze_file failed (Line 197-198 catches and suppresses). That can lock in missing symbols and skip future retries until file mtime changes again.

Proposed fix
-            self._analyze_file(file_path)
-            self.symbol_table.metadata[file_str] = mtime
+            analyzed = self._analyze_file(file_path)
+            if analyzed:
+                self.symbol_table.metadata[file_str] = mtime
+            else:
+                self.symbol_table.metadata.pop(file_str, None)
             updated = True
...
-    def _analyze_file(self, file_path: Path) -> None:
+    def _analyze_file(self, file_path: Path) -> bool:
         """Analyze a single file and populate symbols."""
         try:
             # We use astroid for better inference capabilities later
             tree = self.inference_engine.parse_file(str(file_path))
@@
             # Walk the tree
             self._visit_node(tree, str(file_path), "global")
+            return True
 
         except Exception as e:
             logger.warning(f"Failed to build symbol table for {file_path}: {e}")
+            return False

Also applies to: 188-199

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@refactron/analysis/symbol_table.py` around lines 170 - 172, The metadata
advance is happening regardless of whether _analyze_file succeeded; modify the
flow so that symbol_table.metadata[file_str] = mtime and updated = True are only
executed after a successful run of _analyze_file (i.e., no exception was
raised). Concretely, in the blocks that call _analyze_file (referencing
_analyze_file, file_str, mtime, symbol_table.metadata, and the updated flag),
move the metadata assignment and updated=True into the success path (or after
try/except only if no exception occurred) and ensure caught exceptions do not
update metadata so failed analyses will be retried later; apply the same change
for both occurrences that wrap _analyze_file (the first block and the block
around lines 188-199).
